Statistical Acquisition of Terminology Dictionary

Authors

Xuanjing Huang

Lide Wu

Wang Wen-xin

Abstract

Terminologies are specialized words and compound words used in a particular domain, such as computer science. Because they are very common in scientific articles, the ability to identify terminology automatically could greatly assist domain-related natural language processing applications. Unfortunately, collecting terminology information is difficult and requires tedious, time-consuming manual work. In this paper, a semi-automatic approach is developed to extract technical words and phrases from on-line corpora. This approach can significantly reduce the manual effort needed to build a terminology dictionary. First, domain-specific words that have no entries in the universal dictionary are identified. Second, terminology words are extracted from these new words as well as from the universal dictionary. Then compound words are extracted from combinations of terminology words and other words. The final computer terminology dictionary contains 1,034 words and 3,471 compound words. Experiments show that 89.5 percent of all occurrences of computer terminology can be identified with this terminology dictionary.

Keywords: Chi-square Test, Automatic Indexing, Mutual Information
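The abstract lists the chi-square test and mutual information as keywords but does not say how either statistic is applied. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes chi-square scores how strongly a word is associated with a domain corpus relative to a general corpus, and that pointwise mutual information scores an adjacent word pair as a compound-word candidate. All function and parameter names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero row or column total makes the statistic undefined; return 0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def domain_chi_square(word_in_domain, domain_total, word_in_general, general_total):
    """Assumed use: association of a word with the domain corpus.

    Rows are (word, all other words); columns are (domain, general).
    """
    a = word_in_domain
    b = word_in_general
    c = domain_total - word_in_domain
    d = general_total - word_in_general
    return chi_square_2x2(a, b, c, d)


def pointwise_mutual_information(pair_count, first_count, second_count, total):
    """Assumed use: log2 of P(x, y) / (P(x) * P(y)) for an adjacent word pair."""
    if min(pair_count, first_count, second_count) == 0:
        return float("-inf")
    return math.log2(pair_count * total / (first_count * second_count))
```

Under these assumptions, a word with a high chi-square score would be a candidate terminology word, and a word pair with high mutual information would be a candidate compound word. Any thresholds would have to be tuned on real data.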

Similar resources

Inducing Terminology for Lexical Acquisition

Little attention has been paid to terminology extraction with regard to the possibilities it offers for corpus linguistics and lexical acquisition. The problem of detecting terms in textual corpora has been approached in a complex framework. Terminology is seen as the acquisition of domain specific knowledge (i.e. semantic features, selectional restrictions) for complex terms and/or unknown wor...

Semi-Automatic Acquisition of Domain-Specific Translation Lexicons

We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons. 1 Introduction: Reliable translation lexicons are useful in many applications, such as cross-langua...

Word Knowledge Acquisition for Computational Lexicon Construction

The growth of multilingual information processing technology has created a need for linguistic resources, especially lexical databases. Many attempts have been made to convert the traditional dictionary into a computational dictionary, more widely known as a computational lexicon. TCL’s Computational Lexicon (TCLLEX) is a recent development of a large-scale Thai Lexicon, which aims to serve as a fundamental l...

Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary

A statistical machine translation (SMT) system requires homogeneous training data in order to get domain-sensitive (or context-sensitive) terminology translations. If the data consists of various domains, it is difficult for an SMT system to learn context-sensitive terminology mappings probabilistically. Yet, terminology translation accuracy is an important issue for MT users. This paper explor...

Creating a medical dictionary using word alignment: The influence of sources and resources

BACKGROUND Automatic word alignment of parallel texts with the same content in different languages is among other things used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, IC...

Publication date: 1997
